01 Data Visualization

Author: Dr. Devan Becker
Affiliation: Wilfrid Laurier University


This chapter accompanies Chapter 1, "Data Visualization", in R4DS.

1 Chapter Follow-Along

Use the following code cell to follow along while reading the chapter. (You are encouraged to do this in RStudio instead!)

These supplementary notes use webR, which runs R in your browser (no installation needed)!
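If you are following along in RStudio, a minimal starting point might look like the sketch below. It assumes the ggplot2 and palmerpenguins packages are installed (the chapter itself loads the full tidyverse), and it builds a scatterplot of body mass against flipper length, the pair of variables the chapter starts with. The object name first_plot is only an illustrative choice.

```r
# Assumes ggplot2 and palmerpenguins are installed.
# R4DS loads library(tidyverse) instead, which includes ggplot2.
library(ggplot2)
library(palmerpenguins)

# Flipper length on the x-axis, body mass on the y-axis.
first_plot <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point()

# Printing the object draws the plot.
first_plot
```

Storing the plot in an object before printing it makes it easy to add layers one at a time as the chapter introduces them.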

2 Exercises

1.2 First Steps

  1. How many rows are in penguins? How many columns?
  2. What does the bill_depth_mm variable in the penguins data frame describe? Read the help for ?penguins to find out.
  3. Make a scatterplot of bill_depth_mm vs. bill_length_mm. That is, make a scatterplot with bill_depth_mm on the y-axis and bill_length_mm on the x-axis. Describe the relationship between these two variables.
  4. What happens if you make a scatterplot of species vs. bill_depth_mm? What might be a better choice of geom?
  5. Why does the following give an error and how would you fix it?
  6. What does the na.rm argument do in geom_point()? What is the default value of the argument? Create a scatterplot where you successfully use this argument set to TRUE.
  7. Add the following caption to the plot you made in the previous exercise: “Data come from the palmerpenguins package.” Hint: Take a look at the documentation for labs().
  8. Recreate the following visualization. What aesthetic should bill_depth_mm be mapped to? And should it be mapped at the global level or at the geom level?
  9. Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions. (Note: ggplot2 accepts either “color” or “colour” as an argument. I am changing all instances to the Canadian spelling.)
  10. Will these two graphs look different? Why/why not?
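As one possible starting point for the exercises above, the sketch below shows the inspection tools and the na.rm and labs() pieces in one place. It assumes ggplot2 and palmerpenguins are installed; the object name depth_plot is hypothetical, and the interpretation of the output is left to you.

```r
library(ggplot2)
library(palmerpenguins)

# Inspect the data frame.
dim(penguins)   # number of rows, then number of columns
str(penguins)   # one line per variable, with its type
# ?penguins     # opens the help page (interactive sessions only)

# na.rm = TRUE drops rows with missing values without the usual warning.
# labs() attaches the caption text given in the exercise.
depth_plot <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(na.rm = TRUE) +
  labs(caption = "Data come from the palmerpenguins package.")

depth_plot
```

Try removing na.rm = TRUE and compare the messages R prints when the plot is drawn.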

1.4 Visualizing Distributions

  1. Make a bar plot of species of penguins, where you instead assign species to the y aesthetic. How is this plot different?
  2. How are the following two plots different? Which aesthetic, colour or fill, is more useful for changing the colour of bars?
  3. Make a histogram of the carat variable in the diamonds dataset that is available when you load the ggplot2 package. Experiment with different binwidths. What binwidth reveals the most interesting patterns?
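For the binwidth experiment, a small helper can save some retyping. The sketch below is only one way to set this up: carat_hist() is a hypothetical helper (not from the chapter), and the three binwidths are arbitrary picks for comparison, not recommended values.

```r
library(ggplot2)

# Hypothetical helper: one histogram of carat for a given binwidth.
carat_hist <- function(bw) {
  ggplot(diamonds, aes(x = carat)) +
    geom_histogram(binwidth = bw) +
    labs(title = paste("binwidth =", bw))
}

# Arbitrary candidate binwidths, from coarse to fine.
binwidths <- c(0.5, 0.1, 0.01)
hist_plots <- lapply(binwidths, carat_hist)

# Draw each plot in turn.
for (p in hist_plots) print(p)
```

Swap in your own values for binwidths and compare which features of the distribution appear or disappear.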

1.5.5 Visualizing Relationships

  1. The mpg data frame that is bundled with the ggplot2 package contains 234 observations collected by the US Environmental Protection Agency on 38 car models. Which variables in mpg are categorical? Which variables are numerical? (Hint: Type ?mpg to read the documentation for the dataset.) How can you see this information when you run mpg?
  2. Make a scatterplot of hwy vs. displ using the mpg data frame. Next, map a third, numerical variable to colour, then size, then both colour and size, then shape. How do these aesthetics behave differently for categorical vs. numerical variables?
  3. In the scatterplot of hwy vs. displ, what happens if you map a third variable to linewidth?
  4. What happens if you map the same variable to multiple aesthetics?
  5. Make a scatterplot of bill_depth_mm vs. bill_length_mm and colour the points by species. What does adding colouring by species reveal about the relationship between these two variables? What about faceting by species?
  6. Why does the following yield two separate legends? How would you fix it to combine the two legends?
  7. Create the two following stacked bar plots. Which question can you answer with the first one? Which question can you answer with the second one?
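The mpg exercises above can be started from a shared base plot, as in the sketch below. The choice of cty as the numerical variable and drv as the categorical one is an assumption made for illustration, as are the object names; any other columns of the matching type would work the same way.

```r
library(ggplot2)

# str() lists each column of mpg together with its type.
str(mpg)

# Shared base: engine displacement on x, highway mileage on y.
base_plot <- ggplot(mpg, aes(x = displ, y = hwy))

# cty is numerical, mapped here to both colour and size.
numeric_plot <- base_plot + geom_point(aes(colour = cty, size = cty))

# drv is categorical, mapped here to both colour and shape.
categorical_plot <- base_plot + geom_point(aes(colour = drv, shape = drv))

numeric_plot
categorical_plot
```

Comparing the legends of the two plots is a good way into the questions about how aesthetics behave for each type of variable.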